The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced
in 1968. Approximately 900,000 high school students from
14,000 US high schools take the ASVAB test each year. This
“paper and pencil” test requires the applicant to answer
multiple choice questions (items) on a printed form. The
creation of paper and pencil forms in one of the ten test
topics is called form assembly. Form assembly consists of
picking 20 to 35 items from an item pool of about 300 items
such that: 1) each item appears on at most one form; 2) each
form’s result represents the applicant’s capability; and 3)
each form has the same level of difficulty. The thesis
models the creation of paper and pencil forms as a mixed
integer linear goal program and solves the problem both
optimally and heuristically. Computational results for seven
ASVAB tests show both methods help improve the form assembly
process.
EXECUTIVE SUMMARY


The 1948 Selective Service Act established a
process whereby all United States military applicants take
an aptitude test to measure their suitability for military
job specialties. The latest version of these tests, the
Armed Services Vocational Aptitude Battery (ASVAB), was
introduced in 1968. Approximately 900,000 high school
students from 14,000 US high schools take the ASVAB test
each year. This “paper and pencil” test requires the
applicant to answer multiple choice questions (items) on a
printed form. The Defense Manpower Data Center, as an
executive agency for the ASVAB, is responsible for the
design, development and creation of the tests. The creation
of paper and pencil forms in one of the ten test topics is
called form assembly. Form assembly consists of picking 20
to 35 items from an item pool of about 300 items such that:
1) each item appears on at most one form; 2) each form’s
result represents the applicant’s capability; and 3) each
form has the same level of difficulty. This thesis models
the creation of paper and pencil forms as a mixed integer
linear goal program. One approach solves the program using
commercially available optimization software. A second
approach uses a local search with random restart heuristic.
Both approaches yield good solutions. Computational results
for the seven ASVAB tests show that combining both methods
can improve the form assembly process. The Defense Manpower
Data Center benefits from these computational results.
I. INTRODUCTION


The 1948 Selective Service Act established a process
whereby all United States (US) military applicants take an
aptitude test to measure their suitability for military job
specialties. The latest version of these tests, the Armed
Services Vocational Aptitude Battery (ASVAB), was introduced 
in 1968. A US Air Force Human Resources Laboratory study in 
1973 calculated cost avoidance from these tests at $76.8 
million per year for enlisted technical training [US Air 
Force Human Resources Laboratory 1973]. 

The ASVAB is currently given in about 14,000 US High 
Schools to about 900,000 potential applicants each year 
[Defense Manpower Data Center 1992]. This “paper and pencil” 
test requires the applicant to answer multiple choice 
questions (items). Each question has one correct answer that 
must be selected, on average, from a total of four choices. 
The ASVAB test consists of ten different areas of expertise.
The categories, each with between 20 and 35 specific items,
are Arithmetic Reasoning (AR), Auto and Shop (AS), Coding
Speed (CS), Electronics Information (EI), General Science
(GS), Mechanical Comprehension (MC), Mathematical Knowledge
(MK), Numerical Operations (NO), Paragraph Comprehension
(PC), and Word Knowledge (WK).

The model developed in this thesis addresses only seven
of the ten tests. These seven tests were selected because
they are similarly structured: in each of them, the choice
of the next eligible item is independent of the items chosen
before. In other words, there is no dependency among items
from the perspective of the form assembly process.


The creation of paper and pencil forms for each category
is called “form assembly.” Multiple forms must be
created in each category so that all applicants are not 
tested using the same form. “Form assembly” consists of 
picking 20 to 35 items from a pool of about 300 items such 
that: 1) each item appears on at most one form; 2) each 
form’s result represents the applicant’s capability; and 3) 
each form has the same level of difficulty. The item pool 
itself can be split into several item groups, where each 
group, called a taxonomy, requires a certain number of items 
per form. 

This thesis models the creation of paper and pencil
forms as a mixed integer linear goal program and solves the
problem both optimally and heuristically.


A. TEST THEORY BACKGROUND 


The measurement of a person’s ability or skill level
(denoted θ) is commonly discretized into 100 intervals, so
that each level can be expressed as a percentage. These
intervals are then called percentiles of the ability. The
skill level distribution over the potential applicant
population is approximately normal, allowing percentiles to
be ranked from -3σ to +3σ around a mean. A reasonable
assumption is that the probability p of answering an item
correctly increases as the percentile increases, with p
approaching 1 as the percentile goes to +3σ. Hence, this
probability can be represented by a logistic function,
referred to as an item response curve. A common model
[Lord 1980] uses a three-parameter logistic function like
the one adapted from Lord and Novick [1968] (Figure 1) with

$$p(\theta) = c + \frac{1 - c}{1 + e^{-1.7a(\theta - b)}}$$

Parameter a is a proportionality factor for the slope
at the inflection point. It represents the discriminating
power; in other words, how capable an item is of
distinguishing between applicants. Figure 2 shows an example
where item 1 has a steeper curve in the percentile range
(50,60) than item 2 and therefore provides greater
discrimination between individuals at percentiles 50 and 60.


Figure 1: Parameters of the Logistic Function.
The logistic function represents the probability of answering an
item correctly and is defined with parameters (a, b and c). Parameter a
is proportional to the slope at the inflection point: slope = 0.425a(1-c).
Parameter b indicates an item’s difficulty level by defining the
position of an item’s curve along the ability scale θ. Parameter c is
the guessing parameter [Lord 1980].


Parameter b indicates an item’s difficulty level by
defining the position of an item’s curve along the ability
scale θ: b is the ability level at the inflection point,
where the probability of a correct answer is (1 + c)/2
(that is, 0.5 when c = 0).

Parameter c is the guessing parameter, that is, the
probability of answering an item correctly given an ability
more than 3σ below the mean [Lord 1980]. This guessing
parameter does not necessarily equal the probability of
selecting the one correct answer at random from a certain
number of possible choices.
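The roles of the three parameters can be checked numerically. The following is a minimal sketch, not part of the thesis software; it assumes θ is measured in standard deviations around the mean, and the item parameters in the usage lines are hypothetical.

```python
import math

D = 1.7  # scaling constant that appears in the exponent of the formula


def item_response(theta, a, b, c):
    """Probability of a correct answer under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def slope_at_inflection(a, c):
    """Slope of the curve at theta = b, as given in the Figure 1 caption."""
    return 0.425 * a * (1.0 - c)


if __name__ == "__main__":
    a, b, c = 1.2, 0.5, 0.2  # hypothetical item parameters
    for theta in (-3.0, b, 3.0):
        print(theta, round(item_response(theta, a, b, c), 4))
```

Far below the mean the probability approaches c, at θ = b it equals (1 + c)/2, and the slope there is 1.7a(1 - c)/4 = 0.425a(1 - c).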







Figure 2: Example of the Discriminating Power. 
Figure 2 provides an example of the discriminating power of two items 
for two applicants with percentiles 50 and 60. Item 1 has a steeper 
curve in the percentile range (50,60) than item 2 and therefore provides 
greater discrimination between individuals at percentiles 50 and 60. 


In practice, 1,000 to 10,000 applicants pretest an item
and the parameters a, b and c are estimated from the
results. From the item response curve, an item information
curve is determined (Figure 3). The item information curve
describes the potential information contribution of an item
to a test form at each percentile. These item information
curves comprise the bulk of the data for this thesis.

These item information curves are independent and
additive when it is assumed that the information contribution
of an item to the whole form does not depend on other items
included on the form [Lord 1980]. Therefore all of a form’s
item information curves can be added to get an overall
information curve. This overall information curve is commonly
denoted as the precision of the form.
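The additivity property can be illustrated with a short sketch. The thesis takes the item information curves as given data from DMDC; the closed form used below is the standard information function for the three-parameter logistic model and is an assumption here, as are the item parameters.

```python
import math

D = 1.7


def p3pl(theta, a, b, c):
    """Three-parameter logistic item response curve."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Item information for the 3PL model (assumed standard form)."""
    p = p3pl(theta, a, b, c)
    q = 1.0 - p
    return (D * a) ** 2 * (q / p) * ((p - c) / (1.0 - c)) ** 2


def form_information(items, thetas):
    """Add the item information curves of all items on a form."""
    return [sum(item_information(t, *item) for item in items) for t in thetas]


if __name__ == "__main__":
    thetas = [-3.0 + 6.0 * k / 99 for k in range(100)]  # 100 ability levels
    form = [(1.0, -0.5, 0.2), (1.4, 0.0, 0.25), (0.8, 1.0, 0.2)]  # hypothetical (a, b, c)
    print(max(form_information(form, thetas)))
```

Because the form curve is a plain sum, adding or removing one item changes the curve by exactly that item's information values at each ability level.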
Figure 3: Item Information Curves.
Figure 3 displays examples of different item information curves. These
curves describe the potential information contribution of an item to a
test form at each percentile.


Empirical research and testing have produced a
“reference curve” for each test representing the desired
information distribution over a form’s percentiles. Since
the establishment of a standard reference curve in 1980,
some item pools have changed and it is now possible to
provide forms with “better” information curves than the
reference curve. In such cases, these curves are the new
desired information distribution, but for historical
consistency they cannot be called reference curves. In this
thesis, we therefore refer to the preferred curve as the
“goal curve.”








B. OUTLINE 


Chapter II provides information about research related
to this thesis. Chapter III formulates the form assembly
process as a mixed integer linear goal programming problem
and discusses a heuristic to solve it. Chapter IV provides
results obtained from solving the formulation using a
heuristic and the General Algebraic Modeling System (GAMS)
[Brooke, Kendrick and Meeraus 1992] with the solver OSL
[GAMS 1995]. Chapter V compares the two solution methods and
presents conclusions.








II. RELATED RESEARCH


The bulk of the literature on aptitude and ability
tests involves the concept of item validity [Lord 1980].
Validity in this case is taken to be the extent to which a
test score actually predicts future performance. Toquarn,
Corpe and Dunette [1991] review more than 10,000 articles
related to validity as it pertains to ability tests. Their
literature review highlights the significant effort
associated with this issue. As pertains specifically to the
ASVAB, Maier and Truss [1985] give an example of that test’s
predictability. In this study, the authors demonstrate that
performance on the ASVAB tests is statistically related to
training outcome measures of various US Marine Corps
technical schools.

The present study uses data provided by the Defense
Manpower Data Center (DMDC). As explained in Chapter I,
these data consist of roughly 300 item information
curves, each curve derived by standard statistical
procedures [Lord and Novick 1968] from item response curves.
These data are assumed to be representative with respect to
the validity issue. Accordingly, the DMDC data are used in
the present study simply to demonstrate a methodological
approach to “form assembly.” They are not being used
to demonstrate their predictive validity.

Unlike the validity literature, there exist only a few
publications addressing assembly or construction of ability
or aptitude tests. Berger, Gupta and Berger [1988] present
the construction of Form P for the Air Force Officer
Qualifying Test (AFOQT). They develop two forms of the test
by adding new items to an old form. The objective is to
construct two new forms which are equivalent and parallel to
the original form. “Equivalence” means that each form has
the same information content. “Parallel” means that the
outcome of the test is independent of the form the applicant
has taken. Their approach is heuristic and straightforward.
They select items with the most discriminating power from
the old form; check them against new items; and replace old
items with new items that provide the best match; that is,
a match which produces the smallest information differences
between the old and the new form.

Baker and Wall [1996] use a form assembly similar to
the heuristic approach presented in this thesis. They focus
on a statistical analysis of the Interest Finder Test, a
test to help students explore their occupational and career
interests [DMDC 1992]. They describe form assembly as
consisting of two stages. The first stage screens the item
pool and the second stage uses a heuristic algorithm to
assign items to the form. Their heuristic selects an initial
group of items and exchanges items when a replacement
improves the form. The objective function is a weighted
function that minimizes statistical differences between the
current form and a desired form. These statistical
differences are essentially the mean and standard deviation
of scaling parameters for the test. The actual criteria for
the initial item selection and results with respect to form
assembly are beyond the scope of this thesis.

In summary, the literature review did not reveal prior
attempts to use optimization in form assembly and only
provided scant references to the use of heuristic approaches.
The next chapter discusses the optimization and heuristic
approaches.


III. OPTIMIZATION MODEL AND HEURISTIC


A. OPTIMIZATION MODEL 


The form assembly problem can be formulated as a mixed
integer linear goal programming problem (see Charnes and
Cooper [1961] for a discussion of goal programming)
consisting of two goals. One goal is to assemble forms so
each form’s information curve is as close as possible to the
goal curve. The second goal is to make the forms’ information
curves as “parallel” as possible to one another. The
“parallel” goal seeks an exam whose results are independent
of the form the applicant has taken. An exam with all forms
exactly matching the goal curve would simultaneously satisfy
both goals but this is typically not possible. The parallel
goal therefore encourages each form to be close to the goal
curve.

We implement the first goal by allowing the deviation
from the goal curve to vary in groups, where deviation
within a group has the same penalty per unit and groups
closer to the goal curve have a smaller penalty per unit.
Figure 4 provides an example of the penalty groups. Any
vertical deviation between the goal curve and form curve is
penalized.
Figure 4: Penalty Groups.

This figure displays at percentile 68 how deviation from the goal curve
can be measured in different groups. The vertical distance ΔI would be
penalized per unit with the penalty for group 1 for those units of ΔI
within group 1 and with the penalty per unit for group 2 for those units
of ΔI within group 2. Since it is desired to be as close to the goal
curve as possible, group 1’s penalty per unit would be less than group
2’s penalty per unit.
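The grouped penalty can be sketched as a small function. This is an illustration under stated assumptions rather than the model itself: it fills the cheapest group first, which is what a minimizing model does when penalties increase with the group number, and it charges anything beyond the bounded groups at a separate overflow rate. The function name and arguments are hypothetical.

```python
def penalized_deviation(deviation, caps, penalties, overflow_penalty):
    """Cost of a non-negative deviation spread over penalty groups.

    caps[g] is the width of bounded group g, penalties[g] its cost per unit.
    Units beyond the last bounded group are charged overflow_penalty.
    """
    remaining = deviation
    cost = 0.0
    for cap, penalty in zip(caps, penalties):
        used = min(remaining, cap)  # units of the deviation that fall in this group
        cost += penalty * used
        remaining -= used
    return cost + overflow_penalty * remaining


if __name__ == "__main__":
    # Hypothetical widths and penalties for three bounded groups.
    print(penalized_deviation(0.2, [0.01, 0.05, 0.10], [0.00001, 1.0, 5.0], 100.0))
```

A deviation that stays inside group 1 is almost free, while each further group makes additional distance from the goal curve more expensive.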


The formulation follows. 


Indices:

i    item from the item pool;
p    percentile (ability level);
f    form to be assembled (1,2,..,F);
t    taxonomy (1,2,..,T); and
g    penalty group.

Data:

CAT_g      the maximum deviation between a form and
           the goal curve in group g;
INF_ip     information value of item i at percentile p;
NITEM_t    the required number of items in taxonomy t;
PARAWEI    weight that combines the two goals;
PENALTY_g  penalty per unit deviation within group g;
           and
SHAPE_p    the information value for the goal curve at
           percentile p.

Variables:

x_if       1, if item i is used on form f (0 otherwise);
PY_pfg     deviation above the desired shape in group g
           at percentile p on form f;
NY_pfg     deviation below the desired shape in group g
           at percentile p on form f;
Delplus_f  the total information form 1 contains that
           exceeds form f; and
Delneg_f   the total information form f contains that
           exceeds form 1.

Formulation:

$$\min \sum_{p} \sum_{f} \sum_{g} \mathrm{PENALTY}_g \, (PY_{pfg} + NY_{pfg}) + \mathrm{PARAWEI} \sum_{f \ge 2} (\mathrm{Delplus}_f + \mathrm{Delneg}_f) \tag{1}$$

subject to

$$\sum_{g} PY_{pfg} \ge \sum_{i} \mathrm{INF}_{ip} \, x_{if} - \mathrm{SHAPE}_p \quad \forall p, f \tag{2}$$

$$\sum_{g} NY_{pfg} \ge - \sum_{i} \mathrm{INF}_{ip} \, x_{if} + \mathrm{SHAPE}_p \quad \forall p, f \tag{3}$$

$$\sum_{i \in t} x_{if} = \mathrm{NITEM}_t \quad \forall f, t \tag{4}$$

$$\sum_{f} x_{if} \le 1 \quad \forall i \tag{5}$$

$$\sum_{p} \sum_{i} \mathrm{INF}_{ip} \, x_{i1} - \sum_{p} \sum_{i} \mathrm{INF}_{ip} \, x_{if} = \mathrm{Delplus}_f - \mathrm{Delneg}_f \quad \forall f > 1 \tag{6}$$

$$0 \le PY_{pfg} \le \mathrm{CAT}_g \quad \forall p, f, g \tag{7}$$

$$0 \le NY_{pfg} \le \mathrm{CAT}_g \quad \forall p, f, g \tag{8}$$

$$x_{if} \ \text{binary} \quad \forall i, f$$

$$\mathrm{Delplus}_f, \ \mathrm{Delneg}_f \ge 0 \quad \forall f$$

The first component of the objective function,

$$\sum_{p} \sum_{f} \sum_{g} \mathrm{PENALTY}_g \, (PY_{pfg} + NY_{pfg}),$$

minimizes the vertical distances (weighted deviation)
between the goal curve and the assembled forms. The second
component,

$$\mathrm{PARAWEI} \sum_{f \ge 2} (\mathrm{Delplus}_f + \mathrm{Delneg}_f),$$

encourages forms to have the same information. A second
component having value zero does not necessarily imply
parallel forms since the vertical distances at percentile p
from form 1 to form f can have positive or negative signs
depending on whether form f is above or below form 1. These
positive and negative distances can sum up to zero producing
two forms where Delplus_f = Delneg_f = 0. Nevertheless, the
second component has empirically produced parallel forms and
requires only F-1 additional constraints. Constraints (2)
and (3) determine the positive and negative deviation at
each percentile between the assembled forms and the goal
curve. Constraint (4) ensures the required number of items
per taxonomy is satisfied. Constraint (5) ensures that each
item is used at most once. Constraint (6) determines the
total information difference between form 1 and other forms.
Constraints (7) and (8) bound the positive and negative
deviations.
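The caveat about the second component can be made concrete with two small curves. The sketch below is illustrative only; the function names and the three-point curves are hypothetical, and the values returned for Delplus_f and Delneg_f are the ones a minimizing model would choose under constraint (6).

```python
def information_difference(curve_1, curve_f):
    """Return (Delplus_f, Delneg_f) for form f relative to form 1.

    Only the totals over all percentiles enter, as in constraint (6).
    """
    diff = sum(curve_1) - sum(curve_f)
    return max(0.0, diff), max(0.0, -diff)


def largest_pointwise_gap(curve_1, curve_f):
    """Largest vertical distance between the two curves at any percentile."""
    return max(abs(x - y) for x, y in zip(curve_1, curve_f))


if __name__ == "__main__":
    form_1 = [2.0, 4.0, 2.0]  # hypothetical information values at three percentiles
    form_2 = [3.0, 2.0, 3.0]
    print(information_difference(form_1, form_2))
    print(largest_pointwise_gap(form_1, form_2))
```

Both forms carry the same total information, so the second component is zero, yet the curves differ by up to two units at individual percentiles. This is the price of using only F-1 constraints instead of one per percentile and form.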
B. HEURISTIC APPROACH


Solving the previous problem optimally takes
extensive computation time, as shown in the next chapter. To
provide solutions quickly, a local search with random restart
heuristic (e.g., [Papadimitriou and Steiglitz 1982]) is
developed.

The main objectives for the heuristic are to quickly
complete one assembly and to quickly evaluate small
variations to the assembly. The heuristic uses only integer
arithmetic within efficient code to help improve
performance.

The heuristic starts by dividing the item pool into
arrays of items where each array corresponds to a taxonomy.
These sub-item pools are eligible sets (ES_t) for each
taxonomy.

Each form consists of vectors for each taxonomy
(Assign_tf). The algorithm consists of three main procedures
(Figure 5): fill_initial_forms; do_swap; and
improve_parallel.

Figure 6 displays the pseudocode for the procedure
fill_initial_forms. A random number generator [Lewis,
Goodman and Miller 1969] is used to assemble the initial
forms subject to all constraints.
Figure 5: Main Procedures of the Heuristic.
This figure shows the main procedures for the heuristic algorithm. A
loop over one assembly of all forms runs as often as the user has
chosen. The best assembly is the result.
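The outer loop described in the Figure 5 caption can be sketched generically. The function below is a minimal illustration of random restart, not the thesis code; `build`, `improve` and `evaluate` are hypothetical callables standing in for the random initial assembly, the local search and the objective function. The toy landscape is only there to make the sketch runnable.

```python
import random


def random_restart(build, improve, evaluate, restarts, seed=0):
    """Run a local search from several random starts and keep the best result."""
    rng = random.Random(seed)
    best_value, best_solution = None, None
    for _ in range(restarts):
        solution = improve(build(rng))
        value = evaluate(solution)
        if best_value is None or value < best_value:
            best_value, best_solution = value, solution
    return best_value, best_solution


# Toy example: positions on a line with several local minima.
HEIGHTS = [5, 3, 4, 1, 2, 6, 0, 7]


def descend(i):
    """Move to the lower neighbor until no neighbor is lower."""
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(HEIGHTS)]
        j = min(neighbors, key=lambda k: HEIGHTS[k])
        if HEIGHTS[j] >= HEIGHTS[i]:
            return i
        i = j


if __name__ == "__main__":
    start = lambda rng: rng.randrange(len(HEIGHTS))
    print(random_restart(start, descend, lambda i: HEIGHTS[i], restarts=20))
```

A single descent can stop in a local minimum; repeating it from different random starts and keeping the best result is what makes the user-chosen number of assemblies matter.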


Assign_tf ← ∅; initialize ES_t (assume |ES_t| ≥ F*NITEM_t)
for f = 1 to F
    for t = 1 to T
        while |Assign_tf| < NITEM_t
            randomly select item i from ES_t
            Assign_tf ← Assign_tf ∪ {i}
            ES_t ← ES_t - {i}
        end
    end
end





Figure 6: The Pseudocode for the Procedure fill_initial_forms.
This figure shows how the heuristic randomly assembles the initial
forms. The indices and variables match those from the optimization
model. Assign_tf contains items on form f in taxonomy t. ES_t contains all
items in taxonomy t not currently used on any form.
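The procedure in Figure 6 can be sketched in a few lines. This is an illustrative version under assumptions: it uses Python's built-in generator rather than the generator cited in the text, draws all items of a taxonomy for a form in one call, and the dictionary layout and names are hypothetical.

```python
import random


def fill_initial_forms(eligible, n_items, n_forms, rng):
    """Randomly draw n_items[t] items per taxonomy t for each form.

    eligible maps a taxonomy to its list of item ids (ES_t) and is not
    modified. Returns (assign, remaining), where assign[(t, f)] is the set
    of items on form f in taxonomy t and remaining[t] lists unused items.
    """
    remaining = {t: list(items) for t, items in eligible.items()}
    assign = {}
    for f in range(n_forms):
        for t in list(remaining):
            pool = remaining[t]
            if len(pool) < n_items[t]:
                raise ValueError(f"taxonomy {t}: eligible set too small")
            chosen = set(rng.sample(pool, n_items[t]))
            assign[(t, f)] = chosen
            remaining[t] = [i for i in pool if i not in chosen]
    return assign, remaining


if __name__ == "__main__":
    pools = {"t1": list(range(10)), "t2": list(range(10, 16))}  # hypothetical item ids
    forms, unused = fill_initial_forms(pools, {"t1": 3, "t2": 2}, 2, random.Random(1))
    print(forms, unused)
```

Removing drawn items from the pool before the next form is filled enforces that each item appears on at most one form.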


The procedure do_swap defines a swap as the exchange of
an item from a form (i_out ∈ Assign_tf) with an item from the
appropriate eligible set (i_in ∈ ES_t). Figure 7 shows the
pseudocode for this procedure.


 1  improve ← 1
 2  while improve > 0
 3      improve ← 0
 4      for t = 1 to T
 5          for f = 1 to F
 6              for each item i_out ∈ Assign_tf
 7                  sofar ← ObjFctValue_old
 8                  Assign_tf ← Assign_tf - {i_out}
 9                  for each item i_in ∈ ES_t
10                      Assign_tf ← Assign_tf + {i_in}
11                      calculate ObjFctValue_new
12                      if ObjFctValue_new < sofar (improvement)
13                          sofar ← ObjFctValue_new
14                          candidate = i_in
15                      end if
16                      Assign_tf ← Assign_tf - {i_in}
17                  end
18                  if sofar < ObjFctValue_old
19                      swap candidate with i_out
20                      update involved curves
21                      improve ← improve + 1
22                  end if
23              end
24          end
25      end
26  end while





Figure 7: The Pseudocode for the Procedure do_swap.
This figure shows how item swapping improves forms. ObjFctValue_old is
the sum of all deviations between form f and the goal curve before
potentially swapping an item and ObjFctValue_new is after a potential swap.
The procedure repeats until no swap yields a decrease to the objective
function of any form.


The objective function value measuring the effectiveness of
the swap is the sum of all deviations between form f and the
goal curve. Improvement, as it is used in this context, means
a decrease of the objective function value caused by
swapping an item. This procedure runs through all forms and
eligible sets and checks whether a swap yields improvement.
The while-loop repeats as long as at least one improvement
is found across all forms and eligible sets.

To increase the speed of the algorithm a baseline for
checking the swaps is used. A baseline in this context is
the sum of all item information curves currently assembled
without the item considered for exchange (i_out). Within the
pseudocode of Figure 7, the baseline can be calculated after
step 8; doing so reduces the computational effort needed
to determine the new objective function value in step 11.
Only the 100 information values of item i_in have to be added
to the baseline instead of summing over all items currently
assigned. After all items of the eligible set have been
examined, the swap is executed with the item that gives
the most improvement (candidate).
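The baseline idea can be sketched for a single form. The code below follows the structure of Figure 7 but is an illustration under assumptions, not the thesis implementation: it handles one form only, uses the unweighted sum of absolute deviations as the objective, and the data layout (dictionaries of lists) is hypothetical.

```python
def deviation(curve, goal):
    """Unweighted sum of absolute deviations from the goal curve."""
    return sum(abs(c - g) for c, g in zip(curve, goal))


def do_swap(assign, eligible, info, goal):
    """Swap form items with eligible items while that lowers the deviation.

    assign and eligible map a taxonomy to a list of item ids and are
    modified in place; info[i] is the information curve of item i.
    """
    curve = [0] * len(goal)
    for items in assign.values():
        for i in items:
            curve = [c + x for c, x in zip(curve, info[i])]
    improved = True
    while improved:
        improved = False
        for t in assign:
            for pos in range(len(assign[t])):
                i_out = assign[t][pos]
                # Baseline: the form's curve without the item considered for exchange.
                baseline = [c - x for c, x in zip(curve, info[i_out])]
                sofar, candidate = deviation(curve, goal), None
                for i_in in eligible[t]:
                    # Only the values of i_in are added to the baseline.
                    value = deviation([b + x for b, x in zip(baseline, info[i_in])], goal)
                    if value < sofar:
                        sofar, candidate = value, i_in
                if candidate is not None:
                    assign[t][pos] = candidate
                    eligible[t].remove(candidate)
                    eligible[t].append(i_out)
                    curve = [b + x for b, x in zip(baseline, info[candidate])]
                    improved = True
    return deviation(curve, goal)


if __name__ == "__main__":
    info = {1: [1, 0], 2: [0, 1], 3: [2, 2], 4: [1, 1]}  # hypothetical curves
    form, pool = {"t": [1]}, {"t": [2, 3, 4]}
    print(do_swap(form, pool, info, [2, 2]), form, pool)
```

Each candidate evaluation costs one pass over the percentiles instead of a sum over all assigned items, and every executed swap strictly lowers the objective, so the loop terminates.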

The procedure improve_parallel checks if swapping items
between forms can improve the forms. The procedure starts by
finding the form with the smallest sum of all deviations
from the goal curve so far. This best form is the one with
which the other forms have to be aligned. Figure 8 displays
the pseudocode for the procedure improve_parallel. At this
stage, the heuristic does not allow the objective function
to increase.

An improving swap between forms happens only after all
items within a taxonomy on all forms have been compared with
an item on the best form. The calculation of the curves uses
the baseline principle again. The procedure improve_parallel
terminates when no item is swapped on any form.
improve ← 1
while improve > 0
    improve ← 0
    find best form f_best
    for t = 1 to T
        for each item i_out ∈ Assign_t,f_best
            Assign_t,f_best ← Assign_t,f_best - {i_out}
            for f = 1 to F excluding f_best
                sofar ← (ObjFctValue_f + ObjFctValue_f_best)_old
                for each item i_in ∈ Assign_tf
                    Assign_t,f_best ← Assign_t,f_best + {i_in}
                    Assign_tf ← Assign_tf - {i_in} + {i_out}
                    calculate ObjFctValues
                    better? ← (ObjFctValue_f + ObjFctValue_f_best)_new
                    if better? < sofar then improvement
                        sofar ← better?
                        candidatein = i_in
                        candidateout = i_out
                    end if
                    Assign_tf ← Assign_tf + {i_in} - {i_out}
                    Assign_t,f_best ← Assign_t,f_best - {i_in}
                end
            end
            if sofar < (ObjFctValue_f + ObjFctValue_f_best)_old
                swap candidates
                update involved curves
                improve ← improve + 1
            end if
        end
    end
end while


Figure 8: The Pseudocode for the Procedure improve_parallel. 
This figure shows swaps allowed between forms. A swap, given it improves 
the objective function value, occurs after one item on the best form has
been compared with all other assigned items on the other forms.
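The exchange between forms can be sketched for one taxonomy. The code below is a simplified illustration, not the thesis procedure: it recomputes curves from scratch instead of using the baseline, compares candidate swaps by their gain in the combined deviation of the two forms involved, and uses hypothetical names and data layout.

```python
def deviation(curve, goal):
    """Unweighted sum of absolute deviations from the goal curve."""
    return sum(abs(c - g) for c, g in zip(curve, goal))


def form_value(items, info, goal):
    """Deviation of the summed information curve of a form from the goal."""
    curve = [sum(column) for column in zip(*(info[i] for i in items))]
    return deviation(curve, goal)


def improve_parallel(forms, info, goal):
    """Swap items between the best form and the others while that helps.

    forms is a list of item-id lists for one taxonomy and is modified in
    place. Returns the number of swaps carried out.
    """
    swaps = 0
    improved = True
    while improved:
        improved = False
        best = min(range(len(forms)), key=lambda f: form_value(forms[f], info, goal))
        for pos_best in range(len(forms[best])):
            gain, candidate = 0, None
            for f in range(len(forms)):
                if f == best:
                    continue
                old = form_value(forms[f], info, goal) + form_value(forms[best], info, goal)
                for pos_f in range(len(forms[f])):
                    # Tentatively exchange the two items, evaluate, then undo.
                    forms[best][pos_best], forms[f][pos_f] = forms[f][pos_f], forms[best][pos_best]
                    new = form_value(forms[f], info, goal) + form_value(forms[best], info, goal)
                    forms[best][pos_best], forms[f][pos_f] = forms[f][pos_f], forms[best][pos_best]
                    if new - old < gain:
                        gain, candidate = new - old, (f, pos_f)
            if candidate is not None:
                f, pos_f = candidate
                forms[best][pos_best], forms[f][pos_f] = forms[f][pos_f], forms[best][pos_best]
                swaps += 1
                improved = True
    return swaps


if __name__ == "__main__":
    info = {1: [2, 0], 2: [0, 2], 3: [2, 0], 4: [0, 2]}  # hypothetical curves
    forms = [[1, 3], [2, 4]]
    print(improve_parallel(forms, info, [2, 2]), forms)
```

Only swaps that lower the combined deviation of the two forms are executed, so the objective never increases during this stage.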
IV. COMPUTATIONAL RESULTS


The task is to assemble forms for seven different
tests: Arithmetic Reasoning (AR), Auto and Shop (AS),
Electronics Information (EI), General Science (GS),
Mechanical Comprehension (MC), Mathematical Knowledge (MK),
and Word Knowledge (WK). Table 1 lists the test
specifications.


Test   Item pool size   Forms needed   Items on form   Taxonomies
AR          338               2              30              5
AS          196               2              25              2
EI          190               2              20              4
GS          313               2              25             12
MC          296               2              25              6
MK          327               4              25              5
WK          276               2              35              2


Table 1: Test Requirements and Item Pools.
This table lists the specifications for each of the tests. For example, 
the AR-Test requires the creation of two forms each having 30 items. The 
30 items, falling into five taxonomies, must be selected from an item 
pool of 338 items. 


A. OPTIMIZATION PARAMETER SETTINGS 


The optimization model formulated in the previous
chapter requires the specification of a number of
parameters. A summary sheet for each test contains results as
well as parameter settings. We use the AR-Test as an
example.


Figure 9 shows the implemented objective function. All 
values were empirically developed. The penalties for the 
unbounded variables, py4 and ny4, are 100. Other values are: 
CAT_1 = 0.01; CAT_2 = 0.05; CAT_3 = 0.10;
penalty_1 = 0.00001; penalty_2 = 1.00; penalty_3 = 5.00; and
PARAWEI = 25.


$$\sum_{f} \sum_{p} \{\, 100 \cdot py4 + 100 \cdot ny4 + 0.00001 \cdot py1 + 1 \cdot py2 + 5 \cdot py3 + 0.00001 \cdot ny1 + 1 \cdot ny2 + 5 \cdot ny3 \,\} + 25 \cdot \sum_{f} (\mathrm{Delplus}_f + \mathrm{Delneg}_f)$$


Figure 9: The objective function parameters for the optimization model.
This figure shows the objective function implemented in GAMS for the AR-
Test. It measures the overall distance between the forms and the goal
curve at each percentile. The pys and nys are the deviation variables.
25 * Σ(Delplus + Delneg) is the subgoal to encourage parallel forms.


We use only upper bounds on the deviation variables 
(CAT,) for groups 1, 2 and 3. The following pages display 
for each test the bounds for the penalty groups and the 
weights for the subgoal. 


B. OPTIMIZATION RESULTS 


This section shows results for the assembled tests. The
integrality gap provided is the difference between the best
integer solution identified and a lower bound on the
solution, expressed as a percentage of the lower bound. The
results for all tests are presented in alphabetical order.
Table 2 summarizes the numerical results obtained. Figures
10 to 16 show graphical results.


Test   objfctvalue        objfctvalue          integrality gap   runtime
       lower bound (①)    best solution (②)    in % (③)          (seconds)
AR          865.97              932.33               7.6           15,260
AS        2,788.00            2,862.73               2.7              215
EI        9,489.65            9,561.94               1.0               17
GS        8,095.66            8,433.03               4.2              312
MC          125.04            1,187.83             850.0           50,000
MK        2,006.72            7,278.24             260.0           50,000
WK     (illegible)            5,188.42              39.2           13,934


Table 2: Numerical Results of the Optimization Assembly.
Table 2 summarizes all numerical results for tests assembled using
optimization, where objfctvalue = Objective Function Value. The
integrality gap provided is the difference between the best integer solution
identified and a lower bound on the solution, expressed as a percentage
of the lower bound (i.e., ③ = (② - ①)/①).
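The gap definition in the caption amounts to one line of arithmetic. The sketch below applies it to the bounds reported in the summaries for the AS-Test and the MC-Test; the function name is hypothetical.

```python
def integrality_gap(best_solution, lower_bound):
    """Gap between best integer solution and lower bound, in percent of the bound."""
    return 100.0 * (best_solution - lower_bound) / lower_bound


if __name__ == "__main__":
    print(round(integrality_gap(2862.73, 2788.00), 1))  # AS-Test bounds
    print(round(integrality_gap(1187.83, 125.04), 1))   # MC-Test bounds
```

A small gap means the best form assembly found is provably close to optimal; a large gap, as for the MC-Test, only says that the lower bound is weak or the solution is poor, not which of the two.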


Model results come from an IBM RS6000 Model 590
workstation using GAMS and the OSL solver. The model size
varies, primarily according to the number of forms and the
cardinality of the item pool. The approximate size of the
largest model, MK-Test, is shown below:

number of constraints: 1,250;
number of continuous variables: 4,500;
number of binary variables: 1,300; and
number of non-zero elements: 250,000.
AR - Test (Arithmetic Reasoning):


General Requirements:
forms: 2;
items: 30 each; and 
taxonomies: 5 (7,8,5,5,5 items in taxonomy 1 to 5). 
Settings: 
CAT-values: 0.01, 0.05, 0.1; 
penalties: 0.00001, 1, 5; 
PARAWEI: 25; and 
item pool: 338 items. 
Numerical Results: 
objective function value (lower bound) : 865.97; 
objective function value (best solution): 932.33; 
integrality gap: 7.6 %; and 
runtime (seconds): 15,260 (4.2 hours). 


Graphical Results: Figure 10 below. 
Figure 10: Graphical Results for the AR-Test.
This figure shows results obtained for the AR-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
AS - Test (Auto and Shop):


General Requirements: 
forms: 2; 
items: 25 each; and 
taxonomies: 2 (11, 13 items in taxonomy 1 and 2). 
Settings: 
CAT-values: 0.05, 0.1, 0.5;
penalties: 0.00001, 1, 5; 
PARAWEI: 25; and 
item pool: 196 items. 
Numerical Results: 
objective function value (lower bound) : 2,788.00; 
objective function value (best solution): 2,862.73; 
integrality gap: 2.7 %; and
runtime (seconds): 215.


Graphical Results: Figure 11 below. 





Figure 11: Graphical Results for the AS-Test. 
This figure shows results obtained for the AS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
EI - Test (Electronics Information):


General Requirements:
forms: 2;
items: 20 each; and 
taxonomies: 4 (10,4,2,4 items in taxonomy 1 to 4). 
Settings: 
CAT-values: 0.05, 0.1, 0.7; 
penalties: 0.00001, 1, 10; 
PARAWEI: 3; and 
item pool: 190 items. 
Numerical Results: 
objective function value (lower bound) : 9,489.65; 
objective function value (best solution): 9,561.94; 
integrality gap: 1.0 %; and 
runtime (seconds): 17. 
Graphical Results: Figure 12 below. 







Figure 12: Graphical Results for the EI-Test. 
This figure shows results obtained for the EI-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
GS - Test (General Science):


General Requirements:
forms: 2;
items: 25 each; and
taxonomies: 12 (3,3,4,2,2,3,1,2,2,1,1,1 items in taxonomy 1 to 12).
Settings:
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100; 
PARAWEI: 100; and 
item pool: 313 items. 
Numerical Results: 
objective function value (lower bound) : 8,095.66; 
objective function value (best solution): 8,433.03; 
integrality gap: 4.2 %; and 
runtime (seconds): 312. 


Graphical Results: Figure 13 below. 
Figure 13: Graphical Results for the GS-Test.
This figure shows results obtained for the GS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
MC - Test (Mechanical Comprehension):


General Requirements: 
forms: 2; 
items: 25 each; and 
taxonomies: 6 (11,2,2,2,4,4 items in taxonomy 1 to 6). 
Settings: 
CAT-values: 0.01, 0.05, 0.1;
penalties: 0.00001, 1, 5; 
PARAWEI: 300; and 
item pool: 296 items. 
Numerical Results: 
objective function value (lower bound) : 125.04; 
objective function value (best solution): 1,187.83; 
integrality gap: 850 %; and 
runtime (seconds): 50,000 (13.8 hours). 


Graphical Results: Figure 14 below. 
Figure 14: Graphical Results for the MC-Test.
This figure shows results obtained for the MC-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
MK - Test (Mathematical Knowledge):


General Requirements: 
forms: 4; 
items: 25 each; and 
taxonomies: 5 (3,5,9,7,1 items in taxonomy 1 to 5). 
Settings: 
CAT-values: 0.05, 0.1, 0.5;
penalties: 1, 10, 100; 
PARAWEI: 300; and 
item pool: 327 items. 
Numerical Results: 
objective function value (lower bound) : [illegible];
objective function value (best solution): 7,278.24; 
integrality gap: 7.3 %; and 
runtime (seconds): 50,000 (13.8 hours). 


Graphical Results: Figure 15 below. 
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Figure 15: Graphical Results for the MK-Test. 
This figure shows results obtained for the MK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 to 
form 4 are the information curves for each form. 
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WK - Test (Word Knowledge) : 


General Requirements: 
forms: 2; 
items: 35 each; and
taxonomies: 2 (13,22 items in taxonomy 1 and 2). 
Settings: 
CAT-values: 0.01, 0.05, 0.1; 
penalties: 0.000001, 1, 5; 
PARAWEI: 500; and 
item pool: 276 items. 
Numerical Results: 
objective function value (lower bound) : [illegible];
objective function value (best solution): 5,188.42;
integrality gap: 39.2 %; and
runtime (seconds): 13,934 (3.9 hours).


Graphical Results: Figure 16 below.
Figure 16: Graphical Results for the WK-Test.
This figure shows results obtained for the WK-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
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C. RESULTS OF THE HEURISTIC APPROACH 


The objective function implemented in the heuristic is
as follows:

$$\sum_{f} \sum_{c} \left( p_{fc} + q_{fc} \right)$$


This simplification of the objective function previously 
used (i.e., unweighted deviations and no parallel subgoal) 
was chosen for ease of computation. 
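
As an illustration of the simplified objective described above, the following sketch sums the unweighted positive and negative deviations of each form curve from the goal curve. It assumes that deviations are measured at a discrete set of percentile points; the function name, argument names, and sample data are hypothetical and not taken from the thesis.

```python
from typing import Sequence


def heuristic_objective(form_curves: Sequence[Sequence[float]],
                        goal_curve: Sequence[float]) -> float:
    """Sum of unweighted positive and negative deviations from the goal curve."""
    total = 0.0
    for curve in form_curves:
        for info, goal in zip(curve, goal_curve):
            over = max(info - goal, 0.0)    # positive deviation (form above goal)
            under = max(goal - info, 0.0)   # negative deviation (form below goal)
            total += over + under
    return total


# Toy data: a goal curve and two form curves at four percentile points.
goal = [2.0, 4.0, 6.0, 4.0]
forms = [[2.5, 3.5, 6.0, 4.0],
         [1.0, 4.0, 7.0, 4.5]]
value = heuristic_objective(forms, goal)
print(value)
```

No term in this sketch compares the forms with each other, which corresponds to dropping the parallel subgoal.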

The following pages display the objective function
values per repetition (random restart) of the heuristic as
well as the graph for the best solution found (Figures 17 to
30).
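
The general shape of a random-restart scheme can be sketched as follows. This is not the thesis heuristic: the construction and improvement steps are replaced here by a random draw followed by a simple swap-based local search on a toy one-form problem. All names and data are assumptions made for illustration.

```python
import random
from typing import Callable, List, Tuple

Objective = Callable[[List[int]], float]


def local_search(selected: List[int], pool_size: int,
                 objective: Objective) -> Tuple[List[int], float]:
    """Swap one selected item for an unselected one while that lowers the objective."""
    best = objective(selected)
    improved = True
    while improved:
        improved = False
        for pos in range(len(selected)):
            for candidate in range(pool_size):
                if candidate in selected:
                    continue
                trial = selected[:pos] + [candidate] + selected[pos + 1:]
                value = objective(trial)
                if value < best:
                    selected, best, improved = trial, value, True
    return selected, best


def random_restart(pool_size: int, form_size: int, objective: Objective,
                   repetitions: int, seed: int = 0):
    """Run the local search from several random starts and keep the best result."""
    rng = random.Random(seed)
    history = []  # objective value reached in each repetition
    best_items, best_value = None, float("inf")
    for _ in range(repetitions):
        start = rng.sample(range(pool_size), form_size)
        items, value = local_search(start, pool_size, objective)
        history.append(value)
        if value < best_value:
            best_items, best_value = items, value
    return best_items, best_value, history


# Toy problem: pick 3 items whose information values sum as close to 20 as possible.
item_info = [3, 9, 5, 7, 2, 8, 4, 6]
target = 20


def toy_objective(items: List[int]) -> float:
    return float(abs(sum(item_info[i] for i in items) - target))


best_items, best_value, history = random_restart(len(item_info), 3, toy_objective,
                                                 repetitions=20)
print(best_items, best_value)
```

Plotting `history` together with a horizontal line at `best_value` gives the kind of per-repetition chart shown in the figures that follow.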

The heuristic algorithm is written in Standard Pascal
[e.g., Silicon Valley Software 1991] and runs on a Pentium
166 PC. Table 3 shows the runtimes and the objective
function values.


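Test   Objective function value   Runtime (seconds)
AR      97.74                     [illegible]
AS     230.77                     [illegible]
EI     227.10                     [illegible]
GS     117.18                     [illegible]
MC      47.8                      [illegible]
MK     257.94                     [illegible]
WK     280.68                     160


Table 3: Results for tests assembled with the Heuristic Approach.
As the runtimes show, the heuristic provides results very quickly.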
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AR - Test (Arithmetic Reasoning) : 


General Requirements: 

forms: 2; 

items: 30 each; and 

taxonomies: 5 (7,8,5,5,5 items in taxonomy 1 to 5).
Execution Specifics: 

repetitions: 100; and 

objective function value: 97.74. 
Figure 17: Objective Function Values for each Random Restart.
The flat line indicates the minimum value.





Figure 18: Graphical Results for the AR-Test.
This figure shows results obtained for the AR-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
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AS - Test (Auto and Shop): 


General Requirements: 

forms: 2; 

items: 25 each; and 

taxonomies: 2 (11, 13 items in taxonomy 1 and 2). 
Execution Specifics: 

repetitions: 100; and 

objective function value: 230.77. 
Figure 19: Objective Function Values for each Random Restart.
The flat line indicates the minimum value of the best solution obtained.

Figure 20: Graphical Results for the AS-Test.
This figure shows results obtained for the AS-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
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EI - Test (Electronic Information) : 


General Requirements: 


forms: 2; 

items: 20 each; and 

taxonomies: 4 (10,4,2,4 items in taxonomy 1 to 4).
Execution Specifics: 

repetitions: 100; and 


objective function value: 227.10. 
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Figure 21: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value of the best solution obtained. 
Figure 22: Graphical Results for the EI-Test.
This figure shows results obtained for the EI-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 and
form 2 are the information curves for each form.
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GS - Test (General Science): 


General Requirements: 

forms: 2; 

items: 25 each; and 

taxonomies: 12 (3,3,4,2,2,3,1,2,2,1,1,1 items in taxonomy 1 to 12).
Execution Specifics: 

repetitions: 100; and 


objective function value: 117.18. 
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Figure 23: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value of the best solution obtained. 
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Figure 24: Graphical Results for the GS-Test. 
This figure shows results obtained for the GS-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
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MC - Test (Mechanical Comprehension) : 


General Requirements: 

forms: 4; 

items: 25 each; and 

taxonomies: 6 (11,2,2,2,4,4 items in taxonomy 1 to 6). 
Execution Specifics: 

repetitions: 100; and 

objective function value: 47.8. 
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Figure 25: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value of the best solution obtained. 
Figure 26: Graphical Results for the MC-Test.
This figure shows results obtained for the MC-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 to
form 4 are the information curves for each form.
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MK - Test (Mathematical Knowledge) : 


General Requirements: 

forms: 4; 

items: 25 each; and 

taxonomies: 5 (3,5,9,7,1 items in taxonomy 1 to 5). 
Execution Specifics: 

repetitions: 100; and 


objective function value: 257.94. 
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Figure 27: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value of the best solution obtained. 
Figure 28: Graphical Results for the MK-Test.
This figure shows results obtained for the MK-Test with information on
the vertical axis and the percentiles on the horizontal axis. Form 1 to
form 4 are the information curves for each form.
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WK - Test (Word Knowledge) : 


General Requirements: 

forms: 2; 

items: 35 each; and 

taxonomies: 2 (13,22 items in taxonomy 1 and 2). 
Execution Specifics: 

repetitions: 100; and 


objective function value: 280.68. 
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Figure 29: Objective Function Values for each Random Restart. 
The flat line indicates the minimum value of the best solution obtained. 
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Figure 30: Graphical Results for the WK-Test. 
This figure shows results obtained for the WK-Test with information on 
the vertical axis and the percentiles on the horizontal axis. Form 1 and 
form 2 are the information curves for each form. 
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D. DISCUSSION OF THE RESULTS 


The optimization approach yields good results for the
form assembly. The assembled forms for five of the seven
tests have information curves (form curves) that are very
close to the goal curve and parallel to each other. In the
EI- and WK-Test, the form curves do not reach the goal curve
in the lower half of the percentile range. Setting the
weight of the parallel subgoal to zero for these two tests
does not improve the shape of the form curves. Increasing
the weight of the subgoal yields marginally more parallel
forms but increases the overall distance to the goal curve
much more. Changing the bounds for the deviation variables
has little effect. Discussions with DMDC indicate that the
item pools for the EI- and WK-Test are known to be “weak”
because, in their opinion, too many items were extracted for
Computer Adaptive Testing. (See Wainer [1990] for a
description of this relatively new method of testing.) DMDC
is working to restock these item pools.

The heuristic yields good results for the AR-, AS- and
MC-Test. Results for the GS-Test are not very parallel in
the higher percentile range, and results for the MK-Test are
not very parallel in the lower percentile range. The form
curves of the EI- and WK-Test indicate the same deficiency
in the item pool in the lower half of the percentile range
as mentioned above. The number of repetitions was increased
to 1,000 for the AS- and EI-Test to see whether the
heuristic results can be improved. The objective function
value decreased from 230 to 225 in the AS-Test and only from
227 to 226 in the EI-Test.


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V. USING BOTH OPTIMIZATION AND HEURISTIC APPROACHES 


A. USING THE HEURISTIC SOLUTION AS A BOUND 


Table 4 summarizes a direct comparison of the objective
function values where the heuristic solutions are converted
to the objective function of the optimization model.


Test   Optimization objective   Heuristic objective
       function value           function value
AR       932.38                 1,840.57
AS     2,862.73                 [illegible]
EI     [illegible]              [illegible]
GS     8,433.03                 [illegible]
MC     1,187.83                 [illegible]
MK     7,278.24                 [illegible]
WK     5,188.42                 [illegible]


Table 4: Comparison of the Results.
This table provides the objective function values for both the best
heuristic solution and the best solution obtained solving the
optimization model using the optimization model’s objective function.


The optimization approach yields smaller objective function
values than the heuristic, as would be expected when the
optimization model’s objective function is used for the
evaluation. However, it is surprising that the differences
are so great when the graphical results look similar. For
the AR-Test, the heuristic solution in the percentile range
20 to 50 is not as parallel as the optimization solution,
and this difference is responsible for nearly doubling the
objective function value. The AS-Test is similar: form 2 is
constantly below form 1. For the MK- and WK-Test, the
heuristic’s objective function value is 9.9 and 3.5 times
higher, respectively, than the optimization result. The
higher objective function value of the heuristic solution
for the MK-Test (Figure 15 and Figure 28) is caused by the
parallel gap between form 1 and the three other forms in the
lower percentiles combined with a high weight for the
parallel subgoal. In the WK-Test, the alternating behavior
of the forms around each other in Figure 16 is similar to
the heuristic solution (Figure 30). However, form 1 clearly
dominates form 2 in the lower percentile range. The
heuristic solution for the MC-Test has a higher value than
that of the optimization solution; however, the graphical
result of the heuristic looks much better than that of the
optimization. This is most likely due to the cancellation
effect of positive and negative distances in Figure 14.
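
The cancellation effect mentioned above can be illustrated with a toy example. How distances enter the optimization model's objective is defined earlier in the thesis and is not reproduced here; the sketch only shows, with made-up numbers, that signed distances around a goal curve can sum to zero while the curve is visibly far from the goal.

```python
# Hypothetical goal curve and form curve at four percentile points.
goal = [5.0, 5.0, 5.0, 5.0]
form = [7.0, 3.0, 6.0, 4.0]

signed = [f - g for f, g in zip(form, goal)]      # +2, -2, +1, -1
net_distance = sum(signed)                        # positive and negative parts cancel
absolute_distance = sum(abs(d) for d in signed)   # no cancellation

print(net_distance, absolute_distance)
```

A measure built on the net distance would rate this form as matching the goal exactly, although its curve alternates above and below it.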

Using the heuristic solution as an upper bound for the
objective function value when solving the model with GAMS
and OSL yields better results in almost all cases, as shown
in Table 5. The MC-Test is an exception: its best solution
with the heuristic bound is worse than without it. While
this may happen due to OSL’s branching choice within its
branch and bound enumeration, having a bound should help in
almost all cases.
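
To illustrate why an initial upper bound can help a branch and bound search, the following sketch solves a small covering knapsack problem with and without a starting bound. It is a toy stand-in, not the GAMS/OSL model of the thesis: the problem, the data, the function name, and the bound value of 15 (the cost of a hypothetical heuristic solution using items 1 and 3) are all assumptions.

```python
from typing import List, Optional, Tuple


def branch_and_bound(costs: List[float], weights: List[float], demand: float,
                     upper_bound: float = float("inf")
                     ) -> Tuple[float, Optional[List[int]], int]:
    """Minimize the total cost of chosen items subject to total weight >= demand.

    Returns (best value, best item set or None if nothing beat upper_bound,
    number of nodes expanded, i.e. visited and not pruned).
    """
    n = len(costs)
    # Branch on items in order of increasing cost per unit of weight.
    order = sorted(range(n), key=lambda i: costs[i] / weights[i])
    best_value = upper_bound
    best_items: Optional[List[int]] = None
    expanded = 0

    def lower_bound(depth: int, cost: float, remaining: float) -> float:
        # LP relaxation: cover the remaining demand with the cheapest ratios,
        # allowing a fraction of the last item used.
        for i in order[depth:]:
            if remaining <= 0:
                break
            take = min(1.0, remaining / weights[i])
            cost += take * costs[i]
            remaining -= take * weights[i]
        return cost if remaining <= 1e-9 else float("inf")

    def visit(depth: int, cost: float, remaining: float, chosen: List[int]) -> None:
        nonlocal best_value, best_items, expanded
        if lower_bound(depth, cost, remaining) >= best_value:
            return  # prune: this subtree cannot beat the incumbent
        expanded += 1
        if remaining <= 0:
            best_value, best_items = cost, sorted(chosen)
            return
        i = order[depth]
        visit(depth + 1, cost + costs[i], remaining - weights[i], chosen + [i])
        visit(depth + 1, cost, remaining, chosen)

    visit(0, 0.0, demand, [])
    return best_value, best_items, expanded


costs = [4, 6, 5, 9, 3, 7]
weights = [3, 5, 4, 8, 2, 5]
value_free, items_free, nodes_free = branch_and_bound(costs, weights, demand=12)
value_bounded, items_bounded, nodes_bounded = branch_and_bound(
    costs, weights, demand=12, upper_bound=15)
print(value_free, nodes_free, value_bounded, nodes_bounded)
```

With a fixed branching order, as in this sketch, a valid starting bound can only prune more nodes. A production solver may choose different branches once a bound is supplied, which is one way a bounded run can end with a worse solution within a time limit.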

















Test   Obj. fct. value   Obj. fct. value   Change of inte-   Change of
       (unbounded)       (bounded)         grality gap       runtime (sec)
AS     2,862.73          2,809.90          -2.9              +642
EI     [illegible]       [illegible]       [illegible]       [illegible]
GS     8,433.03          [illegible]       [illegible]       [illegible]
MC     1,187.83          2,605.23          [illegible]       [illegible]
MK     7,278.24          6,532.59          [illegible]       [illegible]
WK     5,188.42          5,227.66          [illegible]       [illegible]


Table 5: Results of the Optimization Starting with the Best Heuristic
Solution.
This table shows a comparison of the results for the optimization
approach, when the heuristic solution bounds the objective function. A
negative number indicates an improvement in time or in the integrality
gap.
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B. CONCLUSIONS 


This thesis demonstrates how a mixed integer linear
goal program can support DMDC’s form assembly process. The
developed heuristic is a good supplement to the optimization
approach described. In some cases, the heuristic solution
yields good upper bounds for the optimization that can
decrease the computation time.


C. RECOMMENDATIONS


The optimization model should be extended to capture 
the other three ASVAB-Tests. 

This heuristic algorithm should be considered a
prototype. Experiments should be conducted with the
objective function to find the most useful expression. While
changing the objective function to match that currently
implemented in the optimization model would be a natural
first step, experimentation should be more expansive. The
heuristic can easily accommodate a nonlinear objective
function (an option not available in integer linear
programming).
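
As one possible direction for such experiments, the sketch below shows how a heuristic could evaluate candidate forms with an interchangeable penalty function, so that a nonlinear measure such as squared deviations can be tried without other changes. The function name, the penalty choices, and the data are hypothetical and are not taken from the thesis.

```python
from typing import Callable, Sequence


def total_penalty(form_curves: Sequence[Sequence[float]],
                  goal_curve: Sequence[float],
                  penalty: Callable[[float], float]) -> float:
    """Sum a per-point penalty of the deviation between each form and the goal."""
    return sum(penalty(info - goal)
               for curve in form_curves
               for info, goal in zip(curve, goal_curve))


goal = [4.0, 4.0]
one_large_miss = [[6.0, 4.0]]    # off by 2 at one point
two_small_misses = [[5.0, 5.0]]  # off by 1 at two points


def squared(d: float) -> float:
    return d * d


linear_a = total_penalty(one_large_miss, goal, abs)
linear_b = total_penalty(two_small_misses, goal, abs)
quadratic_a = total_penalty(one_large_miss, goal, squared)
quadratic_b = total_penalty(two_small_misses, goal, squared)
print(linear_a, linear_b, quadratic_a, quadratic_b)
```

The linear measure rates both forms equally, while the squared measure prefers the form with two small misses. That is the kind of distinction a nonlinear objective can express.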

Further research can also be conducted to implement a
heuristic for Computer Adaptive Testing.
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